ENH: ujson better handling of very large and very small numbers, throw ValueError for bad double_precision arg #4042 #4299

Komnomnomnom · 2013-07-19T23:09:52Z

This makes ujson handle very large and very small numbers a bit better. It doesn't improve precision, but it should now at least handle very small and very large exponents:

In [4]: from pandas.json import dumps

In [5]: dumps(1e-5)
Out[5]: '0.00001'

In [6]: dumps(1e-6)
Out[6]: '0.000001'

In [7]: dumps(1e-7)
Out[7]: '0.0000001'

In [8]: dumps(1e-8)
Out[8]: '0.00000001'

In [9]: dumps(1e-9)
Out[9]: '0.000000001'

In [10]: dumps(1e-10)
Out[10]: '0.0000000001'

In [11]: dumps(1e-11)
Out[11]: '0.0'

In [12]: dumps(1e-11, double_precision=15)
Out[12]: '0.00000000001'

In [13]: dumps(1e-12, double_precision=15)
Out[13]: '0.000000000001'

In [14]: dumps(1e-13, double_precision=15)
Out[14]: '0.0000000000001'

In [15]: dumps(1e-14, double_precision=15)
Out[15]: '0.00000000000001'

In [16]: dumps(1e-15, double_precision=15)
Out[16]: '0.000000000000001'

In [17]: dumps(1e-16, double_precision=15)
Out[17]: '1e-16'

In [18]: dumps(1e-16)
Out[18]: '1e-16'

In [19]: dumps(1e-17)
Out[19]: '1e-17'

In [20]: dumps(1e-40)
Out[20]: '1e-40'

In [21]: dumps(1e-100)
Out[21]: '1e-100'

In [22]: dumps(1e-400)
Out[22]: '0.0'

In [28]: dumps(1e40)
Out[28]: '1e+40'

In [29]: dumps(1e100)
Out[29]: '1e+100'

In [30]: dumps(1e400)
Out[30]: 'null'

In [31]: from pandas.json import loads

In [32]: loads(dumps(1e100))
Out[32]: 1e+100

In [33]: loads(dumps(1e40))
Out[33]: 1e+40

In [34]: loads(dumps(1e-40))
Out[34]: 1e-40
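The two edge cases in the session above (1e400 and 1e-400) can be reproduced without ujson at all. A minimal sketch, assuming only the standard library: Python itself turns those literals into inf and 0.0 before any encoder sees them, which is presumably why the encoder reports 'null' and '0.0' for them. The variable names are illustrative only.

```python
import math

# Python parses out-of-range literals before any encoder sees them.
too_big = 1e400     # overflows the double range, becomes inf
too_small = 1e-400  # underflows below the smallest subnormal, becomes 0.0

print(math.isinf(too_big))  # True
print(too_small == 0.0)     # True

# Values inside the double range survive a text round trip via repr.
for value in (1e100, 1e40, 1e-40, 1e-100):
    assert float(repr(value)) == value
```

So the interesting cases for the encoder are the in-range exponents such as 1e-100 and 1e100, not the literals that have already collapsed to inf or 0.0.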

I have also modified it to throw a ValueError when a bad value is given for double_precision:

In [25]: dumps(1e-400, double_precision=-1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-e15fa4642646> in <module>()
----> 1 dumps(1e-400, double_precision=-1)

ValueError: Invalid value '-1' for option 'double_precision', max is '15'

In [26]: dumps(1e-400, double_precision=16)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-ab74b2f14c7f> in <module>()
----> 1 dumps(1e-400, double_precision=16)

ValueError: Invalid value '16' for option 'double_precision', max is '15'
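The real check lives in the C extension, but its behaviour can be sketched in pure Python. This is a rough illustration only: `check_double_precision` and `MAX_DOUBLE_PRECISION` are hypothetical names, and the lower bound of 0 is an assumption (the tracebacks above only show that -1 and 16 are rejected).

```python
MAX_DOUBLE_PRECISION = 15  # limit quoted in the error messages above


def check_double_precision(precision, maximum=MAX_DOUBLE_PRECISION):
    """Hypothetical helper: reject precisions outside 0..maximum.

    The lower bound of 0 is an assumption; the thread only shows
    that -1 and 16 raise.
    """
    if not 0 <= precision <= maximum:
        raise ValueError(
            "Invalid value '%d' for option 'double_precision', max is '%d'"
            % (precision, maximum)
        )
    return precision


for candidate in (-1, 15, 16):
    try:
        print(candidate, "->", check_double_precision(candidate))
    except ValueError as exc:
        print(candidate, "->", exc)
```

Raising instead of clamping means a caller who asks for 16 digits finds out immediately that the request cannot be honoured.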

Tested on Python 2.7 on 64-bit Arch Linux. It would be great if someone could test this on Windows.


trottier · 2013-07-19T23:51:35Z

How about for numbers that make full use of double precision? E.g. 1.234567890123456e-40

Also, I trust @njsmith's opinion over my own on this ... :)

Komnomnomnom · 2013-07-19T23:58:33Z

It seems to work OK:

In [1]: from pandas.json import dumps

In [2]: dumps(1.234567890123456e-40)
Out[2]: '1.23456789e-40'

In [3]: dumps(1.234567890123456e-40, double_precision=15)
Out[3]: '1.23456789012346e-40'

In [4]: dumps(1.234567890123456e-40, double_precision=16)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-02b1ef3af2eb> in <module>()
----> 1 dumps(1.234567890123456e-40, double_precision=16)

ValueError: Invalid value '16' for option 'double_precision', max is '15'

And at least it throws an error for an invalid precision setting now rather than silently capping it at 15.

trottier · 2013-07-20T00:15:21Z

I'm a little concerned about the rounding up of 5 to 6, here:

In [3]: dumps(1.234567890123456e-40, double_precision=15)
Out[3]: '1.23456789012346e-40'

Komnomnomnom · 2013-07-20T00:26:17Z

That'll be sprintf rounding, which is its standard behaviour when given a precision.

I'm investigating the use of PyOS_double_to_string (thanks @njsmith) which looks like it should reproduce what simplejson does.
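The rounding can be reproduced with Python's own printf-style formatting, which follows the same rules as C's sprintf. A minimal sketch, assuming the exponent path uses a `%g`-style conversion with `double_precision` as the number of significant digits (an assumption, although it matches the outputs quoted above); `format_g` is a hypothetical helper.

```python
value = 1.234567890123456e-40


def format_g(number, precision):
    # "%.*g" takes the precision from the argument tuple, like C's sprintf.
    return "%.*g" % (precision, number)


print(format_g(value, 10))  # 1.23456789e-40
print(format_g(value, 15))  # 1.23456789012346e-40
```

With 15 significant digits the 16th digit (6) is rounded away, which turns the trailing 5 into a 6. With 10 digits the trailing zero is dropped by `%g`, giving the shorter default output.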

Komnomnomnom · 2013-07-20T00:29:39Z

That said, simplejson has the same rounding behaviour, just one decimal place further out, so I'm inclined to stick with the changes above (i.e. sprintf) for now:

In [9]: import json  # simplejson

In [10]: json.dumps(1.234567890123456e-40)
Out[10]: '1.234567890123456e-40'

In [11]: json.dumps(1.234567890123456789e-40)
Out[11]: '1.2345678901234568e-40'
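The practical difference between the two approaches is whether the text round-trips to the same double. A small sketch using only built-ins, where `"%.15g"` stands in for the 15-digit sprintf output (an assumption about the format string, consistent with the output shown earlier in the thread):

```python
value = 1.234567890123456e-40

shortest = repr(value)      # shortest string that round-trips
fifteen = "%.15g" % value   # 15 significant digits, sprintf-style

print(shortest, float(shortest) == value)  # round-trips exactly
print(fifteen, float(fifteen) == value)    # does not round-trip
```

Fifteen significant digits are enough for any decimal input of up to 15 digits, but not for every double, which is why the 16-digit value above comes back slightly different.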

jreback · 2013-07-20T01:38:58Z

@Komnomnomnom merge?

Komnomnomnom · 2013-07-20T01:44:29Z

I'm happy with it, and I don't have any other commits in the pipeline. There's probably scope for more improvements, but not in this PR.


jreback · 2013-07-20T01:45:23Z

thank you sir!

Commit 1871002: ENH: ujson better handling of very large and very small numbers, throw ValueError for bad double_precision arg pandas-dev#4042

Komnomnomnom mentioned this pull request Jul 19, 2013

Ensure accurate encoding/decoding of big and small floats #4042

Closed

jreback mentioned this pull request Jul 19, 2013

TST: raise an error json serialization of floats that cannot be accurate represented #4295

Closed

Komnomnomnom mentioned this pull request Jul 20, 2013

ENH: expose ujson precise_float argument on decode #4300

Merged

jreback added a commit that referenced this pull request Jul 20, 2013

Merge pull request #4299 from Komnomnomnom/ujson-small-floats

ec8920a

ENH: ujson better handling of very large and very small numbers, throw ValueError for bad double_precision arg #4042

jreback merged commit ec8920a into pandas-dev:master Jul 20, 2013

Komnomnomnom deleted the ujson-small-floats branch July 20, 2013 01:46

jreback mentioned this pull request Jul 20, 2013

TST: test_encodeDoubleTinyExponential breaking on 32-bit #4306

Closed
